<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
  <head>
    <meta charset="utf-8" />
    <meta name="generator" content="pandoc" />
    <meta
      name="viewport"
      content="width=device-width, initial-scale=1.0, user-scalable=yes"
    />
    <title>bst_heaps</title>
    <style type="text/css">
      code {
        white-space: pre-wrap;
      }
      span.smallcaps {
        font-variant: small-caps;
      }
      span.underline {
        text-decoration: underline;
      }
      div.column {
        display: inline-block;
        vertical-align: top;
        width: 50%;
      }
    </style>
  </head>
  <body>
    <h1 id="lecture-iii-binary-search-trees-and-heaps">
      Lecture III: Binary Search Trees and Heaps
    </h1>
    <ol type="1">
      <li><a href="#Additional-Resources">Additional Resources</a></li>
      <li><a href="#Text-Buffer">Text Buffer</a></li>
      <li><a href="#Append-and-Prepend">Append and Prepend</a></li>
      <li><a href="#Delete-From-Head-or-Tail">Delete From Head or Tail</a></li>
      <li><a href="#Binary-Search-Trees">Binary Search Trees</a></li>
      <li><a href="#Insert-into-BST">Insert into BST</a></li>
      <li><a href="#Heaps">Heaps</a></li>
      <li><a href="#Building-a-Heap">Building a Heap</a></li>
    </ol>
    <p><a href="https://youtu.be/QYddRpvTaFk">CS18 Sean Chen Lecture</a></p>
    <p>
      <a href="https://www.youtube.com/watch?v=B4ijhReCRHw&amp;feature=youtu.be"
        >CS19 Brian Doyle Lecture</a
      >
    </p>
    <h2 id="additional-resources">Additional Resources</h2>
    <p>
      <a href="https://youtu.be/SsRVdvRsNG0">Binary Search Trees - 15min</a>
    </p>
    <p><a href="https://youtu.be/LYWPsV2YQBA">Heaps - 30min</a></p>
    <h2 id="text-buffer">Text Buffer</h2>
    <p>
      Taking a look at our project repo’s
      <a href="bst_heaps.py">text buffer file</a>, let’s start working through
      what it might be expecting and how to build a text buffer.
    </p>
    <p>
      The first thing we’ll notice is that it imports the
      <code>DoublyLinkedList</code> file to use that data structure.
    </p>
    <p>
      Our <code>init</code> function shows us that we can optionally pass the
      string to <code>self.contents</code> if there is a string parameter passed
      in – but there doesn’t need to be. How should we add to this function so
      it passes that string to the content?
    </p>
    <pre><code>def __init__(self, init=None):
    self.contents = DoublyLinkedList()
    # check if an init string is provided
    # if so, put the contents of the init string in self.contents
    if init:
        pass</code></pre>
    <p>
      We’ll want to add this data to the tail of the linked list, but we don’t
      know <em>what</em> a Text Buffer is, so <em>how</em> isn’t certain.
    </p>
    <p>
      A Text Buffer is typically an array that stores each individual character
      of a string – but we’re using a linked list for speed.
    </p>
    <p>
      So, we’ll loop through the string and add each character to the tail of
      the linked list.
    </p>
    <pre><code>def __init__(self, init=None):
    self.contents = DoublyLinkedList()
    # check if an init string is provided
    # if so, put the contents of the init string in self.contents
    if init:
        for character in init:
            self.contents.add_to_tail(character)</code></pre>
    <p>
      We can use the pre-built <code>add_to_tail</code> function from our Doubly
      Linked List file.
    </p>
    <p>
      If we use a test print statement like
      <code>print(f"Adding {character}")</code>, our terminal will print out the
      following based on the tests at the bottom of the file:
    </p>
    <pre><code>Adding S
Adding u
Adding p
Adding e
Adding r
Super</code></pre>
    <p>
      So we know that it’s properly iterating through the list, and that the
      existing <code>str</code> method concatenates them back together properly
      to print a string value for the Text Buffer.
    </p>
    <p>
      We could also test the contents’ length or tail to see if it matches the
      expected output.
    </p>
    <h2 id="append-and-prepend">Append and Prepend</h2>
    <p>Next, let’s work on append and prepend. Append starts out like so:</p>
    <pre><code>def append(self, string_to_add):
    pass</code></pre>
    <p>
      To append to our linked list, we’ll add a string in the same way as
      before. Just loop through the characters and add to the tail.
    </p>
    <pre><code>def append(self, string_to_add):
    for character in string_to_add:
        self.contents.add_to_tail(character)</code></pre>
    <p>
      Since we have identical code in two places, we could make our code more
      DRY by using <code>self.append</code> in our <code>init</code> function
      instead.
    </p>
    <pre><code>if init:
    self.append(init)</code></pre>
    <p>
      Prepend means we will add the string to the front of the text buffer, or
      the head of the list. Which means we need to reverse the character order
      so that the string is being prepending to read in the correct order.
    </p>
    <p>
      We could visualize it like so. If we wanted to add <code>HELLO</code> to
      the existing list that says <code>WORLD</code>:
    </p>
    <blockquote>
      <p>HELLO WORLD</p>
    </blockquote>
    <p>If we tried to add it in order, it would go like this:</p>
    <blockquote>
      <p>
        H WORLD<br />
        EH WORLD<br />
        LEH WORLD<br />
        LLEH WORLD<br />
        OLLEH WORLD
      </p>
    </blockquote>
    <p>
      Since this wouldnt read correctly, we need to reverse the string so it
      instead is added like this:
    </p>
    <blockquote>
      <p>
        O WORLD<br />
        LO WORLD<br />
        LLO WORLD<br />
        ELLO WORLD<br />
        HELLO WORLD
      </p>
    </blockquote>
    <p>We can reverse a string in two ways in Python.</p>
    <p>
      The first option is to use slicing with <code>[::-1]</code>, but this is
      considered an arcane or difficult to read way of writing:
    </p>
    <pre><code>def prepend(self, string_to_add):
    # reverse the incoming string to maintain correct
    # order when adding to the front of the text buffer
    for character in string_to_add[::-1]:
        self.contents.add_to_head(character)</code></pre>
    <p>
      Another method is to use the built in <code>reversed</code> function like
      so:
    </p>
    <pre><code>def prepend(self, string_to_add):
    # reverse the incoming string to maintain correct
    # order when adding to the front of the text buffer
    for character in reversed(string_to_add):
        self.contents.add_to_head(character)</code></pre>
    <p>
      You can read more about the pros and cons of each method
      <a href="https://dbader.org/blog/python-reverse-string">here</a>.
    </p>
    <h2 id="delete-from-head-or-tail">Delete From Head or Tail</h2>
    <p>
      Next we want to delete some number of characters from either the head or
      the tail. Both of these functions will work really similarly.
    </p>
    <p>
      Since we know exactly how <em>many</em> characters to remove, we can call
      our doubly linked list functions that many times using a loop:
    </p>
    <pre><code>def delete_front(self, chars_to_remove):
    for i in range(0, chars_to_remove):
        self.contents.remove_from_head()

def delete_back(self, chars_to_remove):
    for i in range(0, chars_to_remove):
        self.contents.remove_from_tail()</code></pre>
    <h2 id="join">Join</h2>
    <p>The two remaining functions left to write are both join functions.</p>
    <p>
      The first one asks us to join one text buffer to another, by creating a
      concatenated buffer where the head of this starting buffer is at the head
      of the concatenated buffer, and the tail of the buffer being added is the
      tail of the concatenated buffer.
    </p>
    <p>
      The starting functions gives us some hints about how to approach this
      problem:
    </p>
    <pre><code>&quot;&quot;&quot;
Join other_buffer to self
The input buffer gets concatenated to the end of this buffer
The tail of the concatenated buffer will be the tail of the other buffer
The head of the concatenated buffer will be the head of this buffer
&quot;&quot;&quot;
def join(self, other_buffer):
    # we might want to check that other_buffer is indeed a text buffer
    # set self list tail&#39;s next node to be the head of the other buffer

    # set other_buffer head&#39;s prev node to be the tail of this buffer

    pass</code></pre>
    <p>
      The key of concatenating two buffers relies on setting the current tail’s
      next as the head of the other buffer.
    </p>
    <p>
      First, we need to make sure that the passing in
      <code>other_buffer</code> is actually a text buffer.
    </p>
    <p>
      We can use Python’s <code>isInstance</code> to check if both are an
      instance of the same class. Per the docs:
    </p>
    <blockquote>
      <p>
        def isinstance(obj, class_or_tuple)<br />
        Return whether an object is an instance of a class or of a subclass
        thereof.
      </p>
    </blockquote>
    <p>
      So we’ll keep going if it is a TextBuffer or print an error if there it
      isn’t:
    </p>
    <pre><code>        if isinstance(other_buffer, TextBuffer):
            # set self list tail&#39;s next node to be the head of the other buffer
            # set other_buffer head&#39;s prev node to be the tail of this buffer

        else:
            print(&quot;ERROR: Argument is not a TextBuffer&quot;)
            return</code></pre>
    <p>
      We’ll set the tail of the current TextBuffer pointing to the head of the
      other_buffer:
    </p>
    <pre><code>if isinstance(other_buffer, TextBuffer):
    # set self list tail&#39;s next node to be the head of the other buffer
    self.contents.tail.next = other_buffer.contents.head
    # set other_buffer head&#39;s prev node to be the tail of this buffer
    other_buffer.contents.head.prev = self.contents.tail
    # now that both buffers are connected, we will set this buffer&#39;s tail to the other buffer&#39;s tail, to fully concatenate together
    self.contents.tail = other_buffer.contents.tail
    # make sure to fully extend the length to include the other buffer&#39;s length
    self.contents.length += other_buffer.contents.length</code></pre>
    <p>
      We should also check that the <code>other_buffer</code> being passed in
      does in fact have contents.
    </p>
    <pre><code>if(other_buffer.contents.length == 0):
    print(&quot;ERROR: Other buffer is empty!&quot;)
    return</code></pre>
    <p>
      We have to nest that in <em>after</em> checking that
      <code>other_buffer</code> is a buffer, or else there will be no
      <code>.length</code> to check.
    </p>
    <p>
      Lastly we need to handle joining a string by turning it into text buffer
      first.
    </p>
    <p>This function starts out like so:</p>
    <pre><code># if we get fed a string instead of a text buffer instance,
# initialize a new text buffer with this string and then
# call the join method

def join_string(self, string_to_join):
    pass</code></pre>
    <p>
      This can be done very simply with our pre-written functions. We’ll turn
      that string into a TextBuffer instance and then use our join method to
      combine the new text buffer with the current one.
    </p>
    <pre><code>def join_string(self, string_to_join):
    new_buffer = TextBuffer(string_to_join)
    self.join(new_buffer)</code></pre>
    <h2 id="binary-search-trees">Binary Search Trees</h2>
    <p>What is a Binary Search?</p>
    <p>
      It takes a sorted list and compares two pieces of data, then cuts the list
      in half based on what it is searching for. This results in an
      <code>O(log n)</code> run time, which is very good.
    </p>
    <p>
      To use this, it means our data must be sorted first. One way to do that is
      using a tree.
    </p>
    <p>
      Read more
      <a href="https://www.geeksforgeeks.org/binary-search/">here</a> or see
      this helpful
      <a href="https://www.youtube.com/watch?v=qBGLYzFF1aQ"
        >short video visualization</a
      >.
    </p>
    <p>
      <a href="https://www.youtube.com/watch?v=Re-HdpXo1is">This video</a> is a
      brief explanation of iterative searching through a binary tree.
    </p>
    <p>A tree is a node tree that contains “branches” and “leaves”.</p>
    <p>Valid Binary Trees</p>
    <p>
      A valid binary tree requires that all the nodes have only 0, 1 or 2
      children – not more.
    </p>
    <p>
      It also means that all the <em>left</em> children have values
      <em>less than</em> their parents, while all <em>right</em> children have
      values <em>greater than</em> their parents.
    </p>
    <p>In this manner, the tree is pre-sorted.</p>
    <p>
      Looking at the chart above, we can see in <code>a)</code> that 1’s child
      on the left is 25, which is greater than, so that is an invalid binary
      tree.
    </p>
    <p><code>b)</code> only contains one node, so it’s valid.</p>
    <p>
      <code>c)</code> is also an invalid search tree. 22 is valid because 5 is
      on the right and 54 is on the left. 5 looks valid because 77 is on the
      right side of 5…but because it’s larger than 22, it should be on 22’s
      right side.
    </p>
    <p>
      By having 77 and 92 underneath 22 on the left side of the tree, it’s
      invalid because it’s not properly sorted.
    </p>
    <p>
      <code>d)</code> is also invalid because 21 is larger than 17 but on the
      left side.
    </p>
    <p>What would we do if we have 2 identical values in the Binary Tree?</p>
    <p>
      In a simple BST, we could place that either on the left or right hand
      side, depending on how we want to define which is <em>equal to and</em>.
    </p>
    <p>
      But in an ideal solution, we would not have duplicates. Instead, we would
      store the value <em>and</em> the count of nodes with that value in one
      spot.
    </p>
    <p>For example, if we had two 12’s, instead of listing:</p>
    <pre><code>        12
      /    \
     12    45
    /
   6</code></pre>
    <p>We might list like this instead:</p>
    <pre><code>    12 (2)
    /    \
   6     45</code></pre>
    <p>
      This indicates that we have two nodes with the value of 12, but prevents
      any confusion in our sorted tree by having duplicate values in the
      branches.
    </p>
    <p>
      A detailed explanation is
      <a
        href="https://www.geeksforgeeks.org/how-to-handle-duplicates-in-binary-search-tree/"
        >here</a
      >.
    </p>
    <p>
      Some Binary Search Trees also require it to be balanced by having the same
      number of nodes on the right and left sides. Learn more about
      <a href="https://www.geeksforgeeks.org/convert-normal-bst-balanced-bst/"
        >converting to a balanced BST</a
      >.
    </p>
    <p>
      A Binary Tree <em>could</em> work if we swapped convention and put all
      greater than values down the left side, and lesser than values on the
      right side. It would still funciton. But because it breaks convention, it
      would be poorly written code that confuses other devs that work with that
      code.
    </p>
    <p>
      There could be an exception with good reasoning for doing it that way – in
      which case, we should document it clearly.
    </p>
    <p>On the whole though, it’s best to follow convention.</p>
    <p>
      Visualize your Binary Tree by creating it
      <a href="https://www.cs.usfca.edu/~galles/visualization/BST.html"
        >on this website</a
      >
      or also
      <a href="http://btv.melezinek.cz/binary-search-tree.html">this one</a>.
    </p>
    <p>
      See visualization of many BST aspects
      <a href="https://visualgo.net/bn/bst">here</a>.
    </p>
    <h2 id="insert-into-bst">Insert into BST</h2>
    <p>
      Let’s start writing out a class that initializes a Binary Search Tree by
      creating nodes of the tree. We might initialize it like so:
    </p>
    <pre><code>class BinarySearchTreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None</code></pre>
    <p>
      This initializes a parent node with some passes in value, that currently
      has no children. So we need to write a method that allows us to insert a
      value into the tree, that will place it into the correct spot, following
      the left-right less-than and greater-than rules.
    </p>
    <pre><code>class BinarySearchTreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def insert(self, value):
        if value &lt; self.value:
            # go down the left side
            if not self.left:
                # if there is no more left values, place it there
                self.left = BinarySearchTreeNode(value)
            else:
                # continue searching down the left to find an open spot
                self.left.insert(value)
        else:
            # go down the right side
            if not self.right:
                # place the value node here
                self.right = BinarySearchTreeNode(value)
            else:
                # continue searching
                self.right.insert(value)</code></pre>
    <h2 id="heaps">Heaps</h2>
    <p>What is a heap?</p>
    <p>
      It’s a tree that is held inside of an array. Unlike the typical array that
      is arranged by increasing index, a heap is an array that is arranged
      according to parent-child relationships.
    </p>
    <p>
      Read more <a href="https://www.geeksforgeeks.org/binary-heap/">here</a>.
    </p>
    <p>
      There are two types of heaps:
      <a href="https://www.educative.io/edpresso/min-heap-vs-max-heap"
        >min heap and max heap</a
      >.
    </p>
    <p>
      A min heap means that the key value at the root of the tree must be the
      minimum of all the values in the Binary Heap.
    </p>
    <p>
      A max heaps means that the key value at the root of the tree must be the
      <em>maximum</em> value of the complete binary tree.
    </p>
    <p>If we have two numbers, 42 and 10, how would it look in a tree?</p>
    <p>It’ll start out with 42:</p>
    <blockquote>
      <p>[ -inf, 42]</p>
    </blockquote>
    <p>Which in the tree is just:</p>
    <pre><code>    42</code></pre>
    <p>When we add 10, it will be added in at the bottom of the tree:</p>
    <blockquote>
      <p>[ -inf, 42, 10 ]</p>
    </blockquote>
    <pre><code>    42
   /
  10</code></pre>
    <p>
      BUT then it adjusts by switching 10 to the appropriate place in the tree
      for a min heap (remember, the key value at the root is the lowest).
    </p>
    <blockquote>
      <p>[ -inf, 10, 42 ]</p>
    </blockquote>
    <pre><code>    10
   /
  42</code></pre>
    <p>
      This shows that adding to a min heap is a multi-step process that
      initially adds the new value as a node to the bottom of the tree. Then it
      “bubbles up” or sorts itself into place by moving up a generation each
      time.
    </p>
    <p>We can think of this as the parent must be smaller than the child.</p>
    <p>
      Unlike a BST, where nodes can be added at random, in a heap, the nodes are
      filled up in a sequential order, by filling the left most, lowest spot
      first and moving to the right, until an entire generational level is
      filled. (But then swapping nodes until each value is in the right
      location)
    </p>
    <p>Visualize it like this sequence:</p>
    <pre><code>        10
       /  \
      ?    ?
     / \  / \
    ?  ? ?   ?

        10
       /  \
      11   ?
     / \  / \
    ?  ? ?   ?

        10
       /  \
      11  25
     / \  / \
    ?  ? ?   ?

        10
       /  \
      11  25
     / \  / \
   32  ? ?   ?

        10
       /  \
      11  25
     / \  / \
   32  45 ?   ?

        10
       /  \
      11  25
     / \  / \
   32 45 63   ?

        10
       /  \
      11  25
     / \  / \
   32 45 63  77</code></pre>
    <p>Notice how the open spots filled from left to right, row by row.</p>
    <p>
      Then the parent and child swap if the parent is larger than the child.
    </p>
    <p>
      In a max heap, the larger numbers float to the top instead of the smaller.
    </p>
    <p>What would this data structure be useful for?</p>
    <p>
      This could help us sort prices from low to high, or for a priority queue
      on a server by assigning weights to different types of messages.
    </p>
    <h2 id="building-a-heap">Building a Heap</h2>
    <p>
      Let’s try to work with the heap file in our project repo, generic heap.
    </p>
    <p>It’s initialized like so:</p>
    <pre><code>class Heap:
  def __init__(self, comparator):
    self.storage = []
    self.comparator = comparator</code></pre>
    <p>
      Let’s build the bubble up method and assume this is a min heap. We have
      the <em>index</em> though, not the value of the index.
    </p>
    <p>
      If we’re looking at the index, when do we know we’ve hit our base case? If
      our index is 0.
    </p>
    <p>
      How do we compare this index to the parent’s value? Although we’re
      mentally envisioning this as a binary tree, it’s actually stored as a flat
      array.
    </p>
    <p>
      We’ll use
      <a href="https://www.geeksforgeeks.org/binary-heap/">the algorithm</a>
      that returns the parent node: i-1//2
    </p>
    <p>
      The <code>//</code> divisor means that it drops any decimal points and
      returns a floored number (whole integer).
    </p>
    <pre><code>  def _bubble_up(self, index):
    # until we hit the base case
    while index &gt; 0:
        # compare to parent
        parent = (index-1) // 2 #divided by 2

        # if the parent is greater than...
        if self.storage[index] &lt; self.storage[parent]:
            # swap them
            self.storage[index], self.storage[parent] = self.storage[parent], self.storage[index]
            index = parent
        else:
            # leave it where it is
            break</code></pre>
    <p>
      This compares the parent value to the index value. If the parent is
      larger, then we’ll swap them, and update the index to that value of the
      parent.
    </p>
    <p>
      If the parent value is not larger, then we can break out of the loop
      because the index is in the right place.
    </p>
    <p>
      <em
        >Note: the ReadMe actually indicates that the generic heap should be
        able to be a max or min heap</em
      >
    </p>
  </body>
</html>
